Optical Character Recognition and Parsing of Typeset Mathematics ∗

Authors

  • Richard J. Fateman
  • Taku Tokuyasu
  • Benjamin P. Berman
  • Nicholas Mitchell
Abstract

A wealth of mathematical knowledge that could be very useful in many computational applications is not available in electronic form. This knowledge comes in the form of mechanically typeset books and journals going back more than one hundred years. Besides these older sources, a great many current publications, filled with useful mathematical information, are difficult if not impossible to obtain in electronic form. Our work aims to encode, for use by computer algebra systems, integral tables and other documents currently available in hardcopy only. Our strategy is to extract character information from these documents and pass it to higher-level parsing routines that extract the mathematical content (or any other useful two-dimensional semantic content). This information can then be output as, for example, a Lisp or TeX expression. We have also developed routines for rapid access to this information, specifically for finding matches with formulas in a table of integrals. This paper reviews our current efforts and summarizes our results and the problems we have encountered.

This work was supported in part by NSF grant numbers CCR-9214963 and IRI-9411334, and by NSF Infrastructure Grant number CDA-8722788. Present Address: University of California, San Diego, Computer Science and Engineering Department, 9500 Gilman Drive, La Jolla, CA 92093-0114
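The final step mentioned in the abstract, emitting a parsed formula as a Lisp or TeX expression, can be illustrated with a minimal sketch. The abstract does not describe the internal representation the authors use, so everything here is an assumption made for illustration: the `Node` tree type, the operator names, and the functions `to_lisp` and `to_tex` are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Tuple, Union

# Hypothetical expression tree; the paper's actual data structures
# are not given in the abstract.
Expr = Union[str, int, "Node"]


@dataclass(frozen=True)
class Node:
    op: str                    # assumed operator names: "+", "*", "/", "^", "integral"
    args: Tuple["Expr", ...]   # operands: symbols, integers, or nested nodes


def to_lisp(e: Expr) -> str:
    """Render the tree as a prefix-notation S-expression."""
    if isinstance(e, Node):
        return "(" + " ".join([e.op] + [to_lisp(a) for a in e.args]) + ")"
    return str(e)


def to_tex(e: Expr) -> str:
    """Render the tree as TeX. No parenthesization is attempted,
    so this is only adequate for simple expressions."""
    if not isinstance(e, Node):
        return str(e)
    a = [to_tex(x) for x in e.args]
    if e.op == "/":
        return r"\frac{%s}{%s}" % (a[0], a[1])
    if e.op == "^":
        return "{%s}^{%s}" % (a[0], a[1])
    if e.op == "integral":
        # args are (integrand, variable of integration)
        return r"\int %s \, d%s" % (a[0], a[1])
    if e.op == "+":
        return " + ".join(a)
    if e.op == "*":
        return " ".join(a)
    raise ValueError("unsupported operator: " + e.op)


# The integral of 1/(x^2 + 1) with respect to x, as a table entry might be stored.
expr = Node("integral", (Node("/", (1, Node("+", (Node("^", ("x", 2)), 1)))), "x"))
print(to_lisp(expr))
print(to_tex(expr))
```

Keeping one tree and deriving both output forms from it is a design choice of this sketch: the prefix form is convenient for a computer algebra system or for matching against table entries, while the TeX form is convenient for display.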


Similar resources

Optical Character Recognition for Typeset Mathematics

There is a wealth of mathematical knowledge that could be potentially very useful in many computational applications, but is not available in electronic form. This knowledge comes in the form of mechanically typeset books and journals going back more than a hundred years. Besides these older sources, there are a great many current publications, filled with useful mathematical information, which ...


CS540 Machine Learning Clustering of Typeset Mathematical Symbols Using Spectral Methods and Shape Contexts

Optical character recognition (OCR) of natural languages, both typeset and handwritten, is successfully used today in a wide range of applications. OCR of mathematical expressions and mathematical symbols is not yet as advanced, however. This project demonstrates a method for recognising typeset mathematical symbols. The method involves using spectral methods to perform semi-supervised clusteri...


LaTeX for the LayMaN

We address the problem of parsing handwritten mathematical expressions and converting them to LaTeX format. Recognizing text in prose is in general a more tractable problem because contextual clues can be used. To leverage some similar benefits in our implementation, we retain ambiguity in the character recognition procedure and use context to resolve these ambiguities during parsing.


Optical Character Recognition and Parsing of Typeset Mathematics

There is a wealth of mathematical knowledge that could be potentially very useful in many computational applications, but is not available in electronic form. This knowledge comes in the form of mechanically typeset books and journals going back more than one hundred years. Besides these older sources, there are a great many current publications, filled with useful mathematical information, which are d...


Parsing TeX into Mathematics

Communication, storage, transmission, and searching of complex material have become increasingly important. Mathematical computing in a distributed environment is also becoming more plausible as libraries and computing facilities are connected with each other and with user facilities. TeX is a well-known mathematical typesetting language, and from the display perspective it might seem that it cou...



Publication date: 1996